Multi-level Exemplar-Based Duration Generation for Expressive Speech Synthesis

نویسندگان

  • Mohamed Abou-Zleikha
  • Éva Székely
  • Peter Cahill
  • Julie Carson-Berndsen
چکیده

The generation of duration of speech units from linguistic information, as one component of a prosody model, is considered to be a requirement for natural sounding speech synthesis. This paper investigates the use of a multi-level exemplar-based model for duration generation for the purposes of expressive speech synthesis. The multi-level exemplar-based model has been proposed in the literature as a cognitive model for the production of duration. The implementation of this model for duration generation for speech synthesis is not straightforward and requires a set of modifications to the model and that the linguistically related units and the context of the target units should be taken into consideration. The work presented in this paper implements this model and presents a solution to these issues through the use of prosodic-syntactic correlated data, full context information of the input example and corpus exemplars.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Detecting affective states from text based on a multi-component emotion model

A multi-component emotion model is proposed to describe the affective states comprehensively and provide more details about emotion for the application of expressive speech synthesis. Four types of components from different perspectives – cognitive appraisal, psychological feeling, physical response and utterance manner are involved. And the interactions among them are also considered, by which...

متن کامل

Perceptions of emotions in expressive storytelling

Whereas experimental studies on emotional speech often control for neutral semantics, speech in naturalistic speech corpora is characterized by contextual cues and non-neutral semantic content. Moreover, the target emotion of an utterance is generally unknown and must be inferred by the listener. Within the context of having child-directed expressive text-to-speech synthesis as goal, we describ...

متن کامل

Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech

Expressive speech synthesis has received increased attention in recent times. Stress (or pitch accent) is the perceptual prominence within words or utterances, which contributes to the expressivity of speech. This paper summarizes our contribution to Mandarin expressive speech synthesis. A novel hierarchical stress modeling and generation method for Mandarin is proposed and further integrated i...

متن کامل

Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis

In this paper, we describe an HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified framework of HMM. In the system, pitch and state duration are modeled by multi-space probability distribution HMMs and multi-dimensional Gaussian distributions, respectively. The distributions for spectral parameter, pitch parameter and the state du...

متن کامل

Duration prediction using multi-level model for GPR-based speech synthesis

This paper introduces frame-based Gaussian process regression (GPR) into phone/syllable duration modeling for Thai speech synthesis. The GPR model is designed for predicting framelevel acoustic features using corresponding frame information, which includes relative position in each unit of utterance structure and linguistic information such as tone type and part of speech. Although the GPR-base...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012